Presenting a comprehensive multi-scale evaluation framework for participatory modelling programs: A scoping review

Abstract


Introduction
We live in a complex world with challenges that affect all aspects of our lives. Systems modelling and simulation can improve understanding and management of complex systems challenges. Advances in technology have facilitated accessibility of modelling by diverse stakeholders, allowing them to engage with and contribute to the development of systems models (participatory modelling). However, despite its increasing applications across a range of disciplines, there is a growing need to improve evaluation efforts to effectively report on the quality, importance, and value of participatory modelling.

Materials and Methods
A scoping review approach was utilized, which involved a systematic literature search via Scopus in consultation with experts to identify and appraise records that described an evaluation framework, criteria and/or process in the context of participatory modelling. This scoping review is registered with the Open Science Framework.

Results
The review identified 11 studies, which varied in evaluation purposes, terminologies, levels of evaluation, and time points. The synthesis of studies highlighted areas of overlap and opportunities for further development, which prompted the development of a comprehensive multi-scale evaluation framework to assess participatory modelling programs across disciplines and systems modelling methods. The framework consists of four categories (Feasibility, Value, Change/Action, Sustainability) with 25 evaluation criteria, broken down across project-, individual-, group- and system-level impacts.

Discussion & Conclusion
The presented novel framework brings together a significant knowledge base into a flexible, cross-sectoral evaluation effort that considers the whole participatory modelling process. Developed through the rigorous synthesis of multidisciplinary expertise from existing studies, the application of the framework can provide the opportunity to understand practical future implications such as which aspects are particularly important for policy decisions, community learning, and the ongoing improvement of participatory modelling methods.

Competing interests
Professor Ian Hickie is the Co-Director, Health and Policy at the Brain and Mind Centre (BMC), University of Sydney. The BMC operates early-intervention youth services at Camperdown under contract to headspace. He is the Chief Scientific Advisor to, and a 5% equity shareholder in, InnoWell Pty Ltd. InnoWell was formed by the University of Sydney (45% equity) and PwC (Australia; 45% equity) to deliver the $30 M Australian Government-funded Project Synergy (2017-20; a three-year program for the transformation of mental health services) and to lead transformation of mental health services internationally through the use of innovative technologies.

Introduction

Systems modelling and simulation, also known as dynamic simulation modelling, is a term given to complex systems science analytic methods - such as system dynamics, Bayesian networks, and agent-based models - that are utilized in many countries and across diverse sectors to support evidence-informed decision making and to drive policy reform. [1,2] By taking a complex systems view, significant challenges in society including population health crises, climate change, poverty, and civil strife can be better understood and managed through computer simulation models that capture the causal structure underlying the dynamics of these systems. [1,3-8] Various systems modelling and simulation techniques have traditionally been applied across a range of disciplines including engineering, business, and environmental sciences for decades, [9] but are now increasingly utilized in other fields, including public health. [10-12] This is largely attributed to the utility of systems modelling and simulation in providing decision makers with both immediate and long-term support in understanding the prospective impacts of alternative strategic actions, where traditional statistical methods may be limited. [13-16]

Systems modelling and simulation can provide insights at different levels of scale, including macro, meso, and micro, providing national, state, and local governments with tools that support strategic planning and decision making. [4,17-20] As national models can often hide significant regional variations, a key strength of models developed for local decision making is that they can be contextualized to the complex, local system of interest. [17,21] The need to ground models in local context is giving rise to an increasing commitment to the inclusion of stakeholders in the model building process. [21]

Participatory modelling (PM), or stakeholder-based systems modelling, brings together scientific and local expert knowledge. Advances in technology and software have facilitated stakeholders working across a complex system to engage with and contribute to the development of these models. [11,22] For example, graphical model interfaces allow stakeholders to visualize and understand the logic and assumptions of a model better than earlier software that required articulation of a model using mathematical equations or computer coding. Such accessibility has also facilitated the participation of those most impacted by policy changes (such as consumer representatives) - helping to work towards all stakeholders sharing a common understanding of a complex problem or issue, inform and enhance collective action, assist collective decision making processes, enhance both individual and social learning, as well as precipitate changes in stakeholder behaviors. [9,21,23-29]

Evaluating participatory modelling studies

The inclusion of stakeholders to inform the model building process has led to changes in evaluation practices to reflect this participatory approach. [30-32] By drawing on diverse stakeholder input, experiences of the quality, importance, and value of PM can be better understood.
Common evaluation approaches include formative, summative, process, impact, and outcome evaluation. [33] These evaluation approaches can be applied individually or in combination depending on the program or program activity being evaluated. Formative evaluations aim to strengthen the implementation of a program or activity before it is fully implemented, and are typically conducted when the program or activity is being developed or modified; [33-35] summative evaluations aim to demonstrate whether the program or activity achieved its intended outcomes to understand its ultimate value, and are conducted towards the end of a program; [34,36] process evaluations, similar to formative evaluations, are embedded as part of implementation, and study the processes or factors related to successful implementation of the program or activity for its intended purpose; [33,35,37] impact evaluations assess whether the immediate outcomes are attributable to the program or activity, and are commonly measured retrospectively. [33,37]

At the most basic level, evaluations provide systematic comparisons of program objectives and outcomes to understand how well something is working for the purpose of policy, planning, or implementation. [38,39] According to the Cambridge Dictionary, evaluation is defined as the "process of judging the quality, importance, amount, or value of something." [40] Applying this definition to the context of this paper, there is opportunity to better understand the quality, importance, and value of PM. [37] This shifts the focus from solely the technical model to a more holistic consideration of the whole PM process, providing opportunity for further knowledge on which aspects of PM are particularly important for policy decisions and community learning, as well as the ongoing improvement of PM methods. [41,42]

Evaluators are relied upon to address questions on the effectiveness of investments in local, state, and national programs. [43] There may be various motivations for conducting an evaluation of PM programs, including the desire to improve and share knowledge on good practice for PM, to quantitatively and qualitatively report on project impacts, and to assess the value of PM for future work. [23,37] Evaluations also hold the modellers, funders, and other stakeholders of interest accountable for demonstrating outcomes, as well as providing merit to the work being evaluated. [23] Thus, PM program evaluations can also support policy makers to make evidence-informed decisions in determining how much weight to give the program or model outputs. [23]

Challenges and opportunities in participatory modelling evaluation
There is growing interest in the field of PM evaluation; however, in most cases, evaluation is "lacking or is not based on transparent and systematic methodological approaches." [44] It is also acknowledged that evaluations that comprehensively capture the complex nature of PM can be difficult, [45] as embedding participatory approaches in systems modelling and simulation creates several challenges. [37] For instance, the focus of PM outcomes is often still on the final technical model itself rather than the participatory process used to develop the models. [23] Additionally, previous studies that have attempted to evaluate the benefits of PM […] PM leads to decreased motivation to conduct thorough evaluations and may also risk evaluation efforts being overly simplified when measuring the impact of PM, [23] missing the opportunity to assess the performance of PM in different contexts to inform the adaptation and improvement of processes. […]

The objectives of this scoping review were to: […] 3. synthesize the findings to develop an evaluation framework that can be adapted and executed in diverse PM programs, regardless of the discipline or modelling method.

A scoping review has been deemed the most appropriate approach, compared to a systematic review, as the purpose of this paper is to focus on the broad collection and discussion of available literature, and to present a comprehensive multi-scale evaluation framework for PM programs. [47,48] The development and application of the presented evaluation framework is supported by a participatory systems modelling program for youth mental health (described elsewhere). To our knowledge, this is the first multidisciplinary scoping review of evaluation frameworks for PM programs.

Materials and Methods

This scoping review was conducted according to the suggested methodology outlined in the Joanna Briggs Institute (JBI) Reviewers' Manual for Evidence Synthesis, [49] in combination with additional recommendations for conducting scoping reviews. [50] The Preferred Reporting Items for Systematic Reviews and Meta-Analyses (PRISMA) statement was also applied. [51] This review paper has also been registered with the Open Science Framework.

The Scopus search strategy included the following modelling-related terms (truncated in the available text; a programmatic sketch of the query is given after Table 2):

(TITLE-ABS-KEY ("simulation model*")) OR (TITLE-ABS-KEY ("system* model*")) OR (TITLE-ABS-KEY ("participator* model*")) OR (TITLE-ABS-KEY ("system* dynamic*")) OR (TITLE-ABS-KEY ("agent-based model*")) OR (TITLE-ABS-KEY ("discrete event simulation")) OR (TITLE-ABS-KEY ("Bayesian network*")) OR (TITLE-ABS-KEY ("hybrid simulation")) OR (TITLE-ABS-KEY ("system* science")) OR (…)

The criteria for inclusion were defined a priori by the authors (GYL, LF) in a Population, Concept, Context format, [49] and applied to all yielded records. As detailed in Table 2, this scoping review included sources that described an evaluation framework, criteria, and/or process for PM. Though varying definitions exist, for the purpose of this review we have defined an evaluation framework as a tool that presents an overview of the evaluation theory, topics or themes, questions, and/or data sources; evaluation criteria as a performance metric or indicator that further breaks down the evaluation framework; and an evaluation process as a defined evaluation procedure, guided by theory, of how the authors recommend evaluating PM. [53]

Records that presented a standalone theoretical framework (or applied via a case study example) were also included. In contrast, records that only described the methodological tools (e.g., interviews) used to evaluate the implementation of PM programs without describing an evaluation framework, criteria and/or process were excluded. Records that only described the evaluation of a technical model (e.g., not PM) were excluded, as were records that described PM implementation programs without any consideration of evaluations. Date limits were not set, but studies not published in English were excluded from the review.

Table 2. Inclusion and exclusion criteria in a Population, Concept, Context format, as recommended by JBI.

Population
Inclusion criteria: Not defined (due to the limited number of PM evaluation frameworks, criteria and/or processes, the 'population' category was broadened and not restricted to specific fields of disciplines/population groups).

Concept (e.g., PM program methods)
Inclusion criteria: Records describing an evaluation framework, criteria and/or process to support the evaluation of a PM program (standalone theoretical framework, or applied in a case study).
Exclusion criteria: Records solely describing the methods adopted to evaluate the implementation of PM programs, without describing an evaluation framework, criteria and/or process; records solely describing the evaluation of the technical model (e.g., not PM); records describing PM implementation programs that were not evaluated.

Context (e.g., country, setting)
Inclusion criteria: Not defined (due to the limited number of PM evaluation frameworks, criteria and/or processes, the 'context' category was broadened so as not to limit any cultural, geographic, or specific setting factors).
Exclusion criteria: Outcomes not published in English.
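For transparency, the reported Scopus query can be reproduced programmatically. The following is a minimal sketch, not the authors' code, using the pybliometrics client library for the Scopus API; it assumes a configured Scopus API key and includes only the modelling-related terms visible in the truncated search string above.

```python
# Minimal sketch (not the authors' code) of reproducing the reported Scopus
# query; assumes a Scopus API key has been configured for pybliometrics.
import pybliometrics
from pybliometrics.scopus import ScopusSearch

pybliometrics.scopus.init()  # loads the local API-key config (pybliometrics >= 4)

# Only the modelling-related terms visible in the reported (truncated) search
# string are included here; the evaluation-related half of the query is unknown.
modelling_terms = [
    '"simulation model*"', '"system* model*"', '"participator* model*"',
    '"system* dynamic*"', '"agent-based model*"', '"discrete event simulation"',
    '"Bayesian network*"', '"hybrid simulation"', '"system* science"',
]
query = " OR ".join(f"TITLE-ABS-KEY({term})" for term in modelling_terms)

search = ScopusSearch(query, download=True)
results = search.results or []
print(f"{len(results)} records retrieved")
for record in results[:5]:
    print(record.title)  # each result is a namedtuple with bibliographic fields
```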

Data extraction and synthesis
Using a pro forma approach, the first author (GYL) independently reviewed the titles and abstracts of all yielded records. Uncertainty about whether records met the inclusion criteria was resolved through fortnightly discussions with the senior author (LF). To verify the data extraction, a random sample of 10 records was independently checked by LF. Following this verification process, full text review and data extraction were conducted independently by GYL.
To address the first and second objectives, a data extraction template was developed by the authors (GYL, LF) and used to collate information on yielded records that underwent full text review. The four-dimensional framework (4P) developed by Gray et al. was used to set the basis of the data extraction template. The 4P framework is novel as it was developed specifically to standardize the communication of reporting PM programs. [42] This framework has since been adapted by Freebairn et al. to include two additional dimensions: imPact and Prioritizing. [13] The resulting six dimensions (6P) of the adapted framework include: Purpose (why participatory approaches should be adopted in PM); Process (how stakeholders were engaged to collaboratively build the systems model); Partnerships (who the stakeholders were); […] and Prioritizing (what future priorities were identified as a result of the PM process). [13] The 6P definitions were adapted to fit the evaluation objectives of this scoping review, as detailed in Table 3. To ensure an all-inclusive synthesis of records, the JBI template for data extraction [49] as well as additional elements included by the authors were also incorporated into the final data extraction template (Table 3). Once the first author (GYL) completed full text review, the senior author (LF) reviewed and verified the final list of records to include for synthesis.

Table 3. Data extraction template.
Purpose (e.g., why PM approaches should be evaluated) - 6P
Process (e.g., method utilized to execute the evaluation framework/criteria/process) - 6P
Partnerships (e.g., stakeholders involved in the development of the evaluation framework/criteria/process) - 6P
Products (e.g., level of evaluation: theoretical, conceptual, implementation) - 6P
imPact (e.g., outcomes/strengths of the evaluation framework/criteria/process) - 6P
Prioritizing (e.g., barriers, future opportunities of the evaluation framework/criteria/process) - 6P
[…] - GYL, LF
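To illustrate how the adapted 6P template in Table 3 could be operationalized, the following is a minimal sketch in which each included study becomes one structured record. The class and field names are our own, not the authors' instrument, and the example values are hypothetical.

```python
# Minimal sketch: the adapted 6P data extraction template (Table 3) expressed
# as a dataclass, so each included study becomes one structured record.
from dataclasses import dataclass, field

@dataclass
class ExtractionRecord:
    citation: str                 # bibliographic reference for the record
    purpose: str = ""             # why PM approaches should be evaluated
    process: str = ""             # method used to execute the framework/criteria/process
    partnerships: str = ""        # stakeholders involved in its development
    products: str = ""            # level of evaluation: theoretical, conceptual, implementation
    impact: str = ""              # outcomes/strengths of the framework/criteria/process
    prioritizing: str = ""        # barriers and future opportunities
    extra_notes: list[str] = field(default_factory=list)  # JBI/author-added elements

# Hypothetical example entry for one of the included studies.
record = ExtractionRecord(
    citation="Zorrilla (2010)",
    purpose="Process evaluation criteria for public participation",
    products="Criteria applied to a water resources management case study",
)
```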

To address the third objective, a narrative synthesis of the findings was conducted and utilized to develop an evaluation framework that can be applied across diverse disciplines and modelling methods.

Results

Study selection
The initial Scopus search yielded 465 results; an additional 10 records were identified through hand searching, co-author recommendations, and citation chaining. Most of the resulting 475 records were excluded based on their titles and abstracts (n=451), as the majority only vaguely described the evaluation methods or outcomes of the implementation of a PM program without any reference to an evaluation framework, criteria and/or process. After screening the remaining 24 full-text records, 11 studies were included for synthesis. Though it was not intentional, all included records were from academic journals, as opposed to grey literature and conference papers. The PRISMA flow diagram is presented in Figure 1.

Characteristics of evaluation frameworks, criteria and/or processes
The papers included for synthesis either described a theoretical evaluation framework and/or criteria with no application to a case study; [23,54] described a theoretical evaluation framework and/or criteria applied to a case study; [24,37,55,58,59] or described an evaluation process applied to a case study. [56,57,60,61] The majority of the evaluation frameworks, criteria and/or processes were developed by building upon already existing work. [24,37,55-61] Only two of the evaluation frameworks described an empirical process of how their frameworks were developed, supplemented with literature reviews. [23,54] […] participation; [23,24] and although this was not embedded into the evaluation criteria presented by Zorrilla et al., there was consideration that future work should break down evaluation amongst stakeholder groups from policy makers to farmers. [55] Two papers did not explicitly consider the different levels of impact of PM (e.g., project-level impact vs system-level impact). [54,56] Maskrey et al., Falconi et al., and Hamilton et al. recognized that evaluations should also consider both the immediate and long-term outcomes. [23,37,58] […] focusing on the various elements of the system to understand organizational learning, change, and action. [59,60] This information is summarized in Table 5.

Table 5. Summary of the included evaluation frameworks, criteria and/or processes, with strengths and limitations.

[…]
Strengths: Evaluation framework is flexible to encompass various approaches to PM.
Limitations: The authors of the evaluation framework admitted that the robustness of their evaluation framework has come at the cost of simplification; specifically, the assumption of a linear structure of the framework.

Zorrilla (2010)
Process evaluation criteria for public participation (and its PM tools), with an emphasis on "what works best when" in the context of water resources management. Evaluation criteria broken down into the participatory process (e.g., improve system understanding, foster trust, etc), as well as the capabilities of Bayesian networks (e.g., graphical interface, level of knowledge or uncertainty, etc).

Matthews (2011)
Conceptual evaluation process described that situates outcome evaluation within the wider context of environmental modelling and software activity (EMS) to recognize the differentiation between outcomes (changes in values, attitudes and behavior) and outputs (knowledge mobilized in peer-reviewed articles, software, or datasets). The conceptual evaluation process consists of three loosely coupled phases that link EMS research to outcomes - research, development and operations - in which evaluation plays an integral role across all phases.
Evaluation process built on understanding of the relationship between context, process and outcomes (Blackstock et al., 2007, [71] and Patton, 1998). [72] Conceptual evaluation process is a generalization of the "consultancy model" for successful Decision Support Systems proposed by McCown (2002), [73] where knowledge is passed between phases rather than software tools.

Maskrey (2016)
Evaluation framework designed to understand the benefits and limitations of the PM process itself, and assessment of outcomes. Evaluation framework executed via process evaluation (criteria broken down across themes: accessibility, deliberation, representation, responsiveness, satisfaction) as well as outcome evaluation (broken down into substantive outcomes and social outcomes).
Evaluation framework applied in a simple Bayesian network model to exemplify how PM can support local flood risk management contexts in Hebden Water catchment (UK).
Evaluation framework enables the consideration of the process and final outcome (e.g., short- vs long-term outcomes).

[…] et al., 2003 [79] and Carr et al., 2012. [80] Application of evaluation criteria across five distinct case studies: 1) Community-based forest management, Zimbabwe; 2) Shared vision modelling for the ACT-ACF water basin, USA; 3) Water management alternatives, USA; 4) Water resource allocation, Solomon Islands; and 5) Regional land-use, Senegal River Delta.

[…] Goeller (1998) [84] and Roughley (2009), [85] with the note that, on their own, these frameworks are too generic and only relevant to environmental modelling processes.

Strengths: Consideration […]
Total of 32 evaluation criteria presented, with demonstration of how three common types of research methods - decision support systems, PM and research modelling - should prioritize the criteria for evaluation purposes.
Overview of common evaluation research methodologies presented in Table 4. Research tools are also not provided to guide the adoption of the framework in evaluation practice.
Waterlander (2020)
Evaluation process designed to understand how the system evolves under the influence of the LIKE programme, which aims to address the complex problem of childhood overweight and obesity in 10-14-year-old adolescents through PM.
Evaluation process will be applied in the LIKE programme employing qualitative and quantitative methods to assess changes in health behavior and body weight that result from the programme, and to interpret these outcomes in relation to the system changes.

[…] Zare et al. (2020). [93] Evaluation process applied to an integrated assessment of water allocation and use opportunities modelling project in the Campaspe catchment, part of the Murray-Darling Basin in Victoria (Australia), to respond to challenges regarding water availability.
Recognition that monitoring and evaluation processes are an integral activity during all steps/phases of PM to aim for ambitious outcomes and modify activities over time, as needed.
Strengths: Adaptive and flexible monitoring and evaluation process to suit the needs of complex problem solving. Limitations: A general process is described, rather than a comprehensive evaluation framework/criteria. The authors recognized that it would have been advantageous for the formative and reflective monitoring and evaluation to have been part of a complex, participatory project from the outset.

Bold and italics = clear evaluation framework and/or criteria defined, with case study described; bold = clear evaluation framework and/or criteria defined, with no case study; italics = evaluation process defined, with case study described.

Overall, key benefits and areas of future research were identified for each paper through the data extraction process (i.e., the imPact and Prioritizing categories of Freebairn's adapted 6P communications framework on reporting PM outcomes). For example, Lynam et al.'s paper was one of the first academic papers that attempted to address the gap in research evidence to support improved evaluation practices in PM. [54] It was also one of the first to identify the need to address power relations when working with communities, as well as the PM process distinct from the technical model (e.g., encouraging co-learning/communication vs level of accuracy/precision). [54] As such, Lynam et al. was referenced by various other papers, [24,37,55,58,59] and was used by Zorrilla et al. as a basis to develop their own evaluation framework.

A strength of some of the identified evaluation frameworks, criteria and/or processes was that they were generalizable enough to be applied to other PM programs. [23,24,59] However, for some this came at the cost of oversimplifying the evaluation framework. [24,55,56,61] The strengths and limitations of each individual study are presented in Table 5. Recurring themes were synthesized from across the papers utilizing Freebairn's adapted 6P communications framework for reporting on PM programs. [13] The themes are presented in Table 6.

Table 6. Recurring themes applied to Freebairn's adapted six-dimensional (6P) communications framework that standardizes the reporting of PM studies.

Six-dimensional reporting criteria - Synthesized evaluation themes from included studies

Purpose (e.g., why PM approaches should be evaluated): To develop an evaluation framework and/or criteria for PM programs and/or tools. The aim differed across studies, but ranged from evaluating the: success of PM programs with consideration of participatory processes; [23,24,37,58,61] […]

Challenges and risks identified

It was evident from the synthesis process that differing terminologies, approaches, and assumptions were used, which led to challenges in determining the best evaluation framework to adopt. This poses the risk that evaluation processes will not reach their full potential, which has implications for funders, participating stakeholders, as well as modellers. [94] Additionally, it was evident during the synthesis process that studies either described a comprehensive evaluation framework, criteria and/or process, or they focused on the actual evaluation methodologies; the two rarely coincided. [95,96] Therefore, a comprehensive evaluation framework and criteria are needed for PM programs that have theoretical and empirical underpinnings but are also accompanied by practical evaluation tools and methods to support real-world implementation.

Scope of synthesis and development of comprehensive multi-scale evaluation framework
To analyze the heterogeneity in terminology identified during the data extraction process for the included studies, a word cloud was generated, aligned to Hearst et al.'s recommendations on developing effective word clouds to improve reader understanding (Figure 2). [97] Word clouds visually display the most frequently used words in a body of text; a bigger font size illustrates that a word is used more frequently. [98] To ensure that a focused word cloud was generated specific to evaluation, the authors first uploaded the full text of all 11 studies included for synthesis. A process of elimination was performed, whereby words that were not related to evaluation - such as university, platform, and various stop words - were deleted. Following this process, the authors went through the remaining list of words and merged synonyms, as well as the same words presented in singular or plural form or with tense variation. […] The most frequently used terms were feasibility, value, change/action, and sustainability. Thus, these four terms were identified as the evaluation framework categories (highlighted in yellow, Table 7). The remaining 35 terms (highlighted in grey, Table 7) have been incorporated into the evaluation criteria, or evaluation questions. The word level was incorporated neither as an evaluation framework category nor as a criterion; but as the various levels of evaluation (e.g., project-level vs system-level) were noteworthy across studies, this term was included as a separate component in our evaluation framework (Table 8).
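The word-cloud procedure described above could be approximated with the open-source wordcloud package, as in the following minimal sketch. The directory name and the extra excluded terms are assumptions, and the manual merging of synonyms and singular/plural variants performed by the authors is not reproduced.

```python
# Minimal sketch of the word-cloud step, assuming the 11 included full texts
# are available as plain-text files in a local directory (hypothetical path).
from pathlib import Path
from wordcloud import WordCloud, STOPWORDS

corpus = " ".join(p.read_text() for p in Path("included_studies").glob("*.txt"))

# Standard stop words plus terms unrelated to evaluation, mirroring the
# elimination step described in the text (example exclusions only).
excluded = STOPWORDS | {"university", "platform"}

cloud = WordCloud(stopwords=excluded, collocations=False).generate(corpus)
cloud.to_file("evaluation_word_cloud.png")

# Relative word frequencies behind the cloud, most frequent first.
print(sorted(cloud.words_.items(), key=lambda kv: -kv[1])[:10])
```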

Discussion

This scoping review identified 11 studies that described an evaluation framework, criteria and/or process developed for PM programs. From these papers, the strengths and limitations, as well as overlapping concepts and themes, were synthesized to present a comprehensive multi-scale evaluation framework (Table 8) organized around four categories: Feasibility, Value, Change/Action, and Sustainability. It is recommended that comprehensive evaluation processes need clear criteria to set appropriate benchmarks; [55,71] therefore, the authors developed 25 criteria, which include all key words identified from the word cloud (Table 8).
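The shape of the framework - four categories crossed with levels of impact, each holding evaluation criteria phrased as questions - can be made concrete with a small sketch. The structure below is illustrative only; a single question quoted from Table 8 is shown, and its placement under Value at the project level is our assumption.

```python
# Illustrative sketch of the framework's structure: categories x impact levels,
# each cell holding evaluation questions. Content is elided except one example.
FRAMEWORK_CATEGORIES = ("Feasibility", "Value", "Change/Action", "Sustainability")
IMPACT_LEVELS = ("project", "individual", "group", "system")

framework: dict[tuple[str, str], list[str]] = {
    (category, level): []
    for category in FRAMEWORK_CATEGORIES
    for level in IMPACT_LEVELS
}

# One question from Table 8; its cell assignment here is hypothetical.
framework[("Value", "project")].append(
    "How did the PM process add value (e.g., context, validity, learning) "
    "to developing the systems models?"
)
```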

As an evaluation concept, the feasibility or plausibility of PM allows for questions to be asked regarding whether it was possible for all participants to engage and contribute throughout the PM process. Consideration of the value of the PM process allows for the exploration of questions regarding what was gained due to engaging participants in PM (e.g., learning, confidence, trust). Change & action facilitates observations of impact, including ex ante and ex post comparisons of stakeholder relationships, knowledge, and behaviors as a result of the PM process; sustainability allows for the observation of these impacts over time. […] within the individual- and group-levels (i.e., client vs decision makers), this was not made explicit in their evaluation framework. Jones attempted to do this by separating out evaluation methods for the project team and the stakeholder group, [24] and we further propose examination across stakeholder groups who may participate in the PM process (e.g., workshops, meetings, etc). This is critical as PM processes are inclusive and involve stakeholders from diverse backgrounds. Consideration of the sublevels of participation enables the recognition of, for example, potential power relations and dynamics amongst the stakeholders, to be able to improve PM design and appropriately measure outcomes. [23,24,54-56,59] This has been reflected in our evaluation framework, presented in Table 8, with the individual- and group-levels further stratified to include community participants (e.g., consumer representatives) and professional participants (e.g., policy makers).

Table 8. Comprehensive multi-scale evaluation framework: evaluation questions by level of impact.

PROJECT-LEVEL IMPACT
How did the PM process add value (e.g., context, validity, learning) to developing the systems models?
How was feedback considered throughout the program to improve the PM process (including the build of the systems model)?
Was the PM process flexible enough to take action/respond to the changing needs of the complex system?
How does the PM process promote sustained use of the systems model?

INDIVIDUAL-LEVEL IMPACT
Community Participants (e.g., consumer representatives) How do community participants view the credibility of the PM process?
What are the experiences arising from the application of PM to community participants (e.g., outcomes, salience, ability to share their story)?
Are there changes in perceived knowledge, beliefs, behaviors, or assumptions for participants?
Are there changes in the way participants engage with the system (e.g., reflection)?
Are there sustained changes in knowledge, beliefs, behaviors and/or assumptions for participants?

Professional Participants (e.g., policy makers)
How do professional participants view the credibility of the PM process?
How do professional participants view the credibility of the evidence used to effectively inform the systems model?
What are the experiences of professional participants using the systems model (e.g., confidence using the tool, ease/simple to use, salience, acceptance)?

GROUP-LEVEL IMPACT

Community Participants
How did the community participants contribute and engage during the PM process?
What are the experiences (e.g., benefits and challenges) of working in collaboration with professional participants […]?

Professional Participants (e.g., policy makers)
How were power relationships managed?
How did professional participants ensure that PM processes were inclusive, accessible, and transparent?
What are the experiences (e.g., benefits and challenges) working in interdisciplinary collaboration with community participants and/or other professional participants during the PM process (e.g., communication, relationships, trust, social networks)?

SYSTEM-LEVEL IMPACT
Can systems models be built through a participatory approach that can effectively inform policy, planning, and investment decisions with a degree of confidence in accuracy to address complex systems challenges?
Does the participatory approach in building systems models add sufficient value to warrant the time and resources investment (e.g., improve capacity/efficiency, confidence)?
How have insights from the PM process been applied in the complex system of interest?
What are the factors that have influenced the extent to which the systems model has been utilized?
How have insights from the systems models been applied in the longer term?
How do participants' engagement with and use of the systems model change over time?
What are the longer-term factors that have influenced the extent to which the systems model continues to be utilized to inform policy, planning, and investment decisions?

The principles of Participatory Action Research (PAR) underpin the proposed evaluation framework (Figure 3). PAR aims to improve outcomes and reduce inequities by working with the people whom the systems model most affects, such as consumer representatives. [99] The PAR approach is appropriate in the context of PM as the traditional roles of the modellers as the experts and stakeholders as the study participants are challenged. [100,101] In the PM process stakeholders are invited to contribute their expertise and experience about their understanding of the system of interest, making both participatory modellers and stakeholders equal and active participants. [99] PAR also embeds reflection during all phases of the PM program and can lead to shared learning and joint action for change to improve PM processes. [21,22]

The studies included for synthesis (Table 5, Table 6) used a variety of methods to collect evaluation data. It is recommended that the presented evaluation framework adopt a mixed-methods approach to align with the PM process. Examples of the potential methods include semi-structured interviews, surveys, journey maps and social network analysis (Figure 4; a brief sketch of the latter is given below). A more thorough description of how the presented evaluation framework can be executed through a mixed-methods approach, including the tools the authors have developed, as well as the […]

Evaluations have the potential to measure change at the project, individual, group, and system (policy) level. [23,102,103] Careful thought on design aspects is needed to ensure that evaluations are worthwhile, as they require additional time, resources, and funding. [104] The presented evaluation framework considers the contributions of all participants involved in the PM process, not only the perspectives of the modellers or funder. [104,105]

The presented evaluation framework is also designed to be adaptive, flexible and iterative, to ensure that the framework remains relevant despite the evolving field and contexts in which PM is being applied. [9] The proposed evaluation framework builds on principles of PAR to empower stakeholders from various backgrounds (e.g., community participants to professional participants), and embeds ongoing reflection and learning so that the PM process can respond to the changing needs of complex systems, as well as be applied across disciplines and diverse modelling methods. [21] The presented evaluation framework supports the application of a mixed-methods approach with an emphasis on approaching PM evaluations holistically. [106]
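As a concrete illustration of the social network analysis mentioned above, the following minimal sketch computes degree centrality over stakeholder interactions recorded during PM workshops. networkx is one common library for such analyses; all participant names and edges here are hypothetical.

```python
# Minimal sketch of a social network analysis over hypothetical stakeholder
# interaction data collected during PM workshops (not the authors' tooling).
import networkx as nx

G = nx.Graph()
# Each edge records that two participant groups interacted during the PM process.
G.add_edges_from([
    ("modeller", "policy maker"),
    ("modeller", "consumer representative"),
    ("policy maker", "consumer representative"),
    ("modeller", "clinician"),
])

# Degree centrality as a simple indicator of how connected each participant
# group became through the PM process (e.g., for pre/post comparisons).
for participant, centrality in nx.degree_centrality(G).items():
    print(f"{participant}: {centrality:.2f}")
```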
There are limitations to this scoping review that should be acknowledged. The heterogeneity in terminology was a challenge during the screening, data extraction, and full text review process. However, with the described process in which the first author and senior author worked closely to resolve any ambiguity, a robust method was followed to ensure that the studies included are the most relevant for the purposes of this scoping review. Additionally, though the described search strategy was broad in that it did not set any limits on the field of study, it was narrow in that only the studies that disclosed an evaluation framework, criteria and/or process in a PM context were included.

Conclusion

Evaluations are an integral component of the PM process that should be carefully considered throughout, and not viewed as a separate component or afterthought. With the ability to inform policy change by demonstrating the measured effectiveness of PM, such processes should be adequately supported with an appropriate evaluation design. The presented framework describes a multi-scale and comprehensive, yet flexible, evaluation approach that is built on the rigorous synthesis of strengths and opportunities for further development identified from existing studies. This framework enables the conduct of holistic evaluation practices by considering the project-, individual-, group-, and system-level impacts to understand the feasibility, value, impact, and sustainability of the PM process. Outputs from adopting such an evaluation approach, underpinned by principles of PAR, can be used to guide ongoing improvements to the PM process, empower stakeholders and users of systems models to be more confident in the model outcomes, as well as to improve understanding of which aspects of PM are particularly important for policy decisions.